Computer and Modernization ›› 2013, Vol. 1 ›› Issue (5): 22-27. doi: 10.3969/j.issn.1006-2475.2013.05.006

• Algorithm Design and Analysis •

Research on Data Skew Join Algorithm Based on MapReduce Model

JIN Jian, CHEN Qun, ZHAO Bao-xue   

  1. School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2013-01-11 Revised: 1900-01-01 Online: 2013-05-28 Published: 2013-05-28

Abstract: Join algorithms based on MapReduce are an active topic in massive-data research. However, most existing optimization work assumes that the data are evenly distributed, whereas in practical applications the data to be processed are often skewed. This paper proposes a MapReduce join algorithm called Skew Control Join, which adapts to seriously skewed data. The algorithm first estimates the overall data distribution by sampling, then uses a total partitioner to distribute the data evenly across all Reduce tasks. Experimental results show that the algorithm performs well when the processed data are skewed.
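The sample-then-partition idea in the abstract can be illustrated with a small sketch. This is not the paper's Skew Control Join implementation, which the abstract does not specify in detail; it is a generic sampling-based range partitioner, and every name in it (`sample_split_points`, `range_partition`, `equal_width_partition`, `reducer_loads`) is hypothetical. The sketch assumes integer join keys whose skew is spread over a key range, and it compares the resulting reducer loads with a naive equal-width split of that range. A single extremely hot key would still land on one reducer here, so handling that case needs more than this sketch shows.

```python
import bisect
import random
from collections import Counter


def sample_split_points(keys, num_reducers, sample_size, seed=0):
    """Estimate range boundaries from the quantiles of a random sample.

    Returns num_reducers - 1 sorted split points (hypothetical helper).
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_reducers]
            for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Return the index of the reducer whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


def equal_width_partition(key, max_key, num_reducers):
    """Baseline: cut the key range into equally wide slices, ignoring skew."""
    return min(key * num_reducers // (max_key + 1), num_reducers - 1)


def reducer_loads(keys, partition_fn, num_reducers):
    """Count how many records each reducer would receive."""
    counts = Counter(partition_fn(k) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Synthetic skewed input (an assumption for illustration): small keys are
# far more frequent than large ones.
data_rng = random.Random(42)
keys = [int(data_rng.expovariate(1 / 1000)) for _ in range(20000)]
num_reducers = 4

splits = sample_split_points(keys, num_reducers, sample_size=2000)
sampled_loads = reducer_loads(
    keys, lambda k: range_partition(k, splits), num_reducers)

max_key = max(keys)
naive_loads = reducer_loads(
    keys, lambda k: equal_width_partition(k, max_key, num_reducers),
    num_reducers)

print("sampled range partition:", sampled_loads)
print("equal-width partition:  ", naive_loads)
```

With the sampled boundaries, each reducer receives roughly a quarter of the records, while the equal-width split sends most of them to the first reducer. In a join, both input relations would have to be partitioned with the same split points so that matching keys meet at the same reducer.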

Key words: join algorithm, data skew, total partition, sample
